Check Model Conformance (Operator Toolbox)
Synopsis
This operator tests whether a given ExampleSet is compatible with the training data of a given model.
Description
The Check Model Conformance operator is designed to give the user more control over the testing of a model prior to deployment. When an Apply Model operator is used, it checks the compatibility of the new, unlabeled data with the training data. Because every RapidMiner model contains reference information about its training data in its header, this operator can perform three tests to determine whether the unlabeled data are compatible with the model. The tests examine both the attributes and their data types.
Test 1: Attribute Comparison Test – this test determines whether the attributes of the test data set are a subset of, equal to, or a superset of the attributes of the training data. The operator throws an error if the relationship it finds is not one of the relationships you have allowed.
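The set comparison behind Test 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not the operator's implementation: the function names and the relationship labels ("equal", "subset", "superset", "mismatch") are assumptions made for this sketch and are not the operator's actual parameter values.

```python
def attribute_relationship(training_attrs, test_attrs):
    """Classify the test attribute set relative to the training attribute set."""
    train, test = set(training_attrs), set(test_attrs)
    if test == train:
        return "equal"
    if test < train:      # proper subset: some training attributes are missing
        return "subset"
    if test > train:      # proper superset: extra attributes are present
        return "superset"
    return "mismatch"     # neither set contains the other


def check_attributes(training_attrs, test_attrs, allowed=("equal", "superset")):
    """Raise an error unless the relationship is one of the allowed ones (hypothetical helper)."""
    relation = attribute_relationship(training_attrs, test_attrs)
    if relation not in allowed:
        raise ValueError(
            f"Attribute sets are in relation '{relation}', allowed: {allowed}"
        )
    return relation


# Regular attributes of the Golf sample data set
golf_train = ["Outlook", "Temperature", "Humidity", "Wind"]
```

For example, `check_attributes(golf_train, golf_train + ["Id"])` passes with the default `allowed` tuple, while dropping `Wind` from the test attributes raises a `ValueError`.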
Test 2: Data Type Comparison Test – this test determines whether all data types are equal to the ones in the training set. RapidMiner data types follow a specific hierarchy: nominal -> polynominal, binominal; numerical -> integer, real; date_time -> date, time. The Check Model Conformance operator can also take the hierarchy into account: it can accept types that share the same 'parent' (e.g. it accepts a real data type if the training data type was integer) or types that are 'children' of the training type (e.g. it accepts a real data type if the training data type was numerical).
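The type hierarchy described in Test 2 can be modelled as a small parent lookup table. The sketch below is a minimal illustration under that assumption; the function name and the relationship labels ("equal", "child", "parent", "same parent", "incompatible") are hypothetical and do not correspond to the operator's actual parameter values.

```python
# Parent of each data type, following the hierarchy given in the description
PARENT = {
    "polynominal": "nominal",
    "binominal": "nominal",
    "integer": "numerical",
    "real": "numerical",
    "date": "date_time",
    "time": "date_time",
}


def type_relationship(training_type, test_type):
    """Describe how the test data type relates to the training data type."""
    if test_type == training_type:
        return "equal"
    if PARENT.get(test_type) == training_type:
        return "child"          # e.g. training: numerical, test: real
    if PARENT.get(training_type) == test_type:
        return "parent"         # e.g. training: real, test: numerical
    test_parent = PARENT.get(test_type)
    if test_parent is not None and test_parent == PARENT.get(training_type):
        return "same parent"    # e.g. training: integer, test: real
    return "incompatible"
```

A caller could then compare the returned label against the set of relationships it is willing to accept, in the same way as for the attribute comparison.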
Test 3: Nominal Data Class Comparison Test – this is only applied to nominal attributes. It determines if all classes (e.g. sunny/overcast/rain in the Golf sample data set) of the testing data were also present in the training data. If a nominal class is not present during training, RapidMiner treats it as a missing value. Hence the user can decide if an error should be thrown in case of a non-present value, or if new 'flag' attributes should be generated. These flag attributes can be used later to filter for "non-compatible" examples.
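The check in Test 3 amounts to comparing the values seen in the test data against the set of values known from training. The following is a minimal sketch of that comparison, not the operator's code; the function name and the `fail_on_error` argument mirror the parameter described below but are otherwise assumptions of this example, as is the choice to ignore missing values (`None`).

```python
def check_nominal(name, training_values, test_column, fail_on_error=False):
    """Return the sorted test values of one nominal attribute that were not seen in training.

    Missing values (None) are ignored here; this is an assumption of the sketch.
    """
    known = set(training_values)
    unseen = sorted({v for v in test_column if v is not None and v not in known})
    if unseen and fail_on_error:
        raise ValueError(f"Attribute '{name}' contains values unseen in training: {unseen}")
    return unseen


# Outlook classes present in the Golf training data
outlook_training = ["sunny", "overcast", "rain"]
```

With `fail_on_error=False` the function only reports the unseen values, which corresponds to the case where the user prefers flag attributes over an error.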
Input
- exa (Data Table)
The ExampleSet you want to test for model compatibility.
- mod (Decision Tree)
The model object used for the tests.
Output
- exa (Data Table)
The input ExampleSet, extended with the new flag attributes added by the operator.
- mod (Decision Tree)
The input model is passed through to this port unchanged.
Parameters
- check_attributes If this option is checked, the Attribute Comparison Test is performed. Range: boolean
- check_nominals If this option is checked, the Nominal Data Class Comparison Test is performed. Range: boolean
- fail_on_error If this option is checked, the operator raises an error if it finds a nominal value that was not present in the training data. Range: boolean
- allowed_attribute_relationships This defines the allowed relationships between the training and testing attribute sets. Range:
- allowed_type_relationships This defines the allowed relationships between the training and testing attribute types. Range:
- generate_flag_attributes If this option is checked, a flag attribute indicating nominal values missing from the training data is generated for each nominal attribute. Additionally, a flag attribute indicating whether any of these flags is true is generated. Range: boolean
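To illustrate what flag generation produces, the sketch below adds one boolean flag per nominal attribute plus a combined flag to each example. It is a rough illustration only: examples are represented as plain dictionaries, and the flag names (`<attribute>_unknown`, `any_unknown`) are invented for this sketch and are not the names the operator generates.

```python
def add_flag_attributes(examples, training_values):
    """Return copies of the examples with per-attribute and combined 'unknown value' flags.

    examples:        list of dicts mapping attribute name -> value
    training_values: dict mapping nominal attribute name -> set of values seen in training
    """
    flagged = []
    for row in examples:
        new_row = dict(row)  # leave the input example untouched
        any_flag = False
        for attr, known in training_values.items():
            value = row.get(attr)
            # Missing values are not flagged; this is an assumption of the sketch.
            flag = value is not None and value not in known
            new_row[f"{attr}_unknown"] = flag
            any_flag = any_flag or flag
        new_row["any_unknown"] = any_flag
        flagged.append(new_row)
    return flagged


golf_known = {
    "Outlook": {"sunny", "overcast", "rain"},
    "Wind": {"true", "false"},
}
golf_test = [
    {"Outlook": "sunny", "Wind": "true"},
    {"Outlook": "foggy", "Wind": "false"},
]
golf_flagged = add_flag_attributes(golf_test, golf_known)
```

Filtering on the combined flag, for example `[r for r in golf_flagged if not r["any_unknown"]]`, keeps only the examples whose nominal values were all seen during training.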
Tutorial Processes
Check for model conformance on Golf
In this example we check for model compatibility on the Golf sample data set. We train a decision tree model and test its compatibility with another version of the Golf sample data set. In this second version we changed two nominal values, one in the Wind attribute and one in the Outlook attribute. The Check Model Conformance operator is set not to fail but to create the corresponding flag attributes.